Multistage Temporal Difference Learning for 2048-Like Games
Similar Resources
Temporal Difference Learning for Nondeterministic Board Games
We use temporal difference (TD) learning to train neural networks for four nondeterministic board games: backgammon, hypergammon, pachisi, and Parcheesi. We investigate the influence of two variables on the development of these networks: first, the source of training data, either learner-vs.-self or learner-vs.-other game play; second, the choice of attributes used: a simple encoding of the boar...
Learning to Play Board Games using Temporal Difference Methods
A promising approach to learn to play board games is to use reinforcement learning algorithms that can learn a game position evaluation function. In this paper we examine and compare three different methods for generating training games: (1) Learning by self-play, (2) Learning by playing against an expert program, and (3) Learning from viewing experts play against themselves. Although the third...
Deep Reinforcement Learning for 2048
In this paper, we explore the performance of a Reinforcement Learning algorithm using a Policy Neural Network to play the popular game 2048. After proposing a model of the state and action spaces, we review our learning process, and train a first model without incorporating any prior knowledge of the game. We prove that a simple Probabilistic Policy Network achieves a 4 times higher maxi...
Dual Temporal Difference Learning
Recently, researchers have investigated novel dual representations as a basis for dynamic programming and reinforcement learning algorithms. Although the convergence properties of classical dynamic programming algorithms have been established for dual representations, temporal difference learning algorithms have not yet been analyzed. In this paper, we study the convergence properties of tempor...
Preconditioned Temporal Difference Learning
LSTD is numerically unstable for some ergodic Markov chains that visit some states in preference to the remaining ones, because the matrix that LSTD accumulates has a large condition number. In this paper, we propose a variant of temporal difference learning with high data efficiency. A class of preconditioned temporal difference learning algorithms is also proposed to speed up the new met...
Journal
Journal title: IEEE Transactions on Computational Intelligence and AI in Games
Year: 2017
ISSN: 1943-068X,1943-0698
DOI: 10.1109/tciaig.2016.2593710